Search for: All records

Creators/Authors contains: "Huang, Yuxin"

Modelling the cosmic dispersion measure in the D < 120 Mpc Local Universe

https://doi.org/10.1093/mnras/staf417

Huang, Yuxin; Lee, Khee-Gan; Libeskind, Noam I; Simha, Sunil; Valade, Aurélien; Prochaska, J Xavier (March 2025, Monthly Notices of the Royal Astronomical Society)

ABSTRACT The Local Universe ($D < 120$ Mpc) has been intensely studied for decades, with highly complete galaxy redshift surveys now publicly available. These data have driven density reconstructions of the underlying matter density field, as well as constrained simulations that aim to reproduce the observed structures. In this paper, we introduce a dispersion measure (DM) model that makes use of this detailed knowledge of our Local Universe within $D < 120$ Mpc. The model comprises three key components: (i) the DM from the Milky Way’s halo and the intragroup medium (up to 3.4 Mpc), derived from the Hestia simulations, a series of constrained hydrodynamic simulations designed to reproduce our Local Group; (ii) the DM contribution from the large-scale intergalactic medium beyond the Local Group (3.4 Mpc $< D < 120$ Mpc), calculated using the Hamlet reconstructed matter density field; and (iii) the individual DM contributions from Local Universe galaxy haloes and clusters based on data from the Two Micron All Sky Survey Galaxy Group Catalogue and the NASA/IPAC Extragalactic Database. This comprehensive model will be made available as a Python package. As the most realistic model to date for DM in the local volume, it promises to improve the constraints of DM contributions from the intergalactic medium and circumgalactic medium of fast radio bursts (FRBs), thereby enhancing the accuracy of cosmic baryon distribution calculations based on DM analysis of FRBs.
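As a rough illustration of how the three components in this abstract combine along one sight line, the sketch below integrates a free-electron density profile over distance and then sums the pieces. It is not the authors' package: the function names, the trapezoidal integration, and the input format are assumptions made here for illustration only.

```python
PC_PER_MPC = 1.0e6  # parsecs per megaparsec


def dispersion_measure(distances_mpc, n_e_cm3):
    """Trapezoidal integral of electron density along a sight line.

    distances_mpc: increasing distances from the observer, in Mpc.
    n_e_cm3: free-electron density at each distance, in cm^-3.
    Returns the DM in pc cm^-3. Redshift corrections are ignored, which
    is only a reasonable approximation at very low redshift.
    """
    if len(distances_mpc) != len(n_e_cm3):
        raise ValueError("distance and density samples must have equal length")
    dm = 0.0
    for i in range(1, len(distances_mpc)):
        dl_pc = (distances_mpc[i] - distances_mpc[i - 1]) * PC_PER_MPC
        dm += 0.5 * (n_e_cm3[i] + n_e_cm3[i - 1]) * dl_pc
    return dm


def total_local_dm(dm_local_group, dm_igm, dm_haloes):
    """Sum the three components (hypothetical helper).

    dm_local_group: Milky Way halo plus intragroup medium, out to 3.4 Mpc.
    dm_igm: large-scale intergalactic medium, 3.4 Mpc to 120 Mpc.
    dm_haloes: iterable of DM values from intersected haloes and clusters.
    """
    return dm_local_group + dm_igm + sum(dm_haloes)
```

Any densities passed in would have to come from the simulations and reconstructions named in the abstract; no values are assumed here.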
Free, publicly-accessible full text available March 26, 2026
FRB Line-of-sight Ionization Measurement from Lightcone AAOmega Mapping Survey: The First Data Release

https://doi.org/10.3847/1538-4365/adbc7f

Huang, Yuxin; Simha, Sunil; Khrykin, Ilya S; Lee, Khee-Gan; Prochaska, J Xavier; Tejos, Nicolas; Bannister, Keith W; Barrios, Jason; Chisholm, John; Cooke, Jeff; et al (April 2025, The Astrophysical Journal Supplement Series)

Abstract This paper presents the first public data release (DR1) of the FRB Line-of-sight Ionization Measurement From Lightcone AAOmega Mapping (FLIMFLAM) survey, a wide field spectroscopic survey targeted on the fields of 10 precisely localized fast radio bursts (FRBs). DR1 encompasses spectroscopic data for 10,468 galaxy redshifts across 10 FRB fields with z < 0.4, covering approximately 26 deg² of the sky in total. FLIMFLAM is composed of several layers, encompassing the “wide” (covering ∼degree or >10 Mpc scales), “narrow” (several arcminutes or ∼Mpc), and integral field unit (“IFU”; ∼arcminute or ∼100 kpc) components. The bulk of the data comprises spectroscopy from the Two Degree Field-AAOmega instrument on the 3.9 m Anglo-Australian Telescope, while most of the narrow and IFU data was achieved using an ensemble of 8–10 m class telescopes. We summarize the information on our selected FRB fields, the criteria for target selection, methodologies employed for data reduction, spectral analysis processes, and an overview of our data products. An evaluation of our data reveals an average spectroscopic completeness of 48.43%, with over 80% of the observed targets having secure redshifts. Additionally, we describe our approach to generating angular masks and calculating the target selection functions, setting the stage for the impending reconstruction of the matter density field.
Free, publicly-accessible full text available April 1, 2026
FLIMFLAM DR1: The First Constraints on the Cosmic Baryon Distribution from Eight Fast Radio Burst Sight Lines

https://doi.org/10.3847/1538-4357/ad6567

Khrykin, Ilya S; Ata, Metin; Lee, Khee-Gan; Simha, Sunil; Huang, Yuxin; Prochaska, J Xavier; Tejos, Nicolas; Bannister, Keith W; Cooke, Jeff; Day, Cherie K; et al (September 2024, The Astrophysical Journal)

Abstract The dispersion measure of fast radio bursts (FRBs), arising from the interactions with free electrons along the propagation path, constitutes a unique probe of the cosmic baryon distribution. Their constraining power is further enhanced in combination with observations of the foreground large-scale structure and intervening galaxies. In this work, we present the first constraints on the partition of the cosmic baryons between the intergalactic medium (IGM) and circumgalactic medium (CGM), inferred from the FLIMFLAM spectroscopic survey. In its first data release, the FLIMFLAM survey targeted galaxies in the foreground of eight localized FRBs. Using Bayesian techniques, we reconstruct the underlying ∼Mpc-scale matter density field that is traced by the IGM gas. Simultaneously, deeper spectroscopy of intervening foreground galaxies (at impact parameters $b_\perp \lesssim r_{200}$) and the FRB host galaxies constrains the contribution from the CGM. Applying Bayesian parameter inference to our data and assuming a fiducial set of priors, we infer the IGM cosmic baryon fraction to be $f_\mathrm{igm} = 0.59_{-0.10}^{+0.11}$ and a CGM gas fraction of $f_\mathrm{gas} = 0.55_{-0.29}^{+0.26}$ for $10^{10}\,M_\odot \lesssim M_\mathrm{halo} \lesssim 10^{13}\,M_\odot$ halos. The mean FRB host dispersion measure (rest-frame) in our sample is $\langle \mathrm{DM}_\mathrm{host} \rangle = 90_{-19}^{+29}\ \mathrm{pc\,cm^{-3}}$, of which $\langle \mathrm{DM}_\mathrm{host}^\mathrm{unk} \rangle = 69_{-19}^{+28}\ \mathrm{pc\,cm^{-3}}$ arises from the host galaxy interstellar medium (ISM) and/or the FRB progenitor environment. While our current $f_\mathrm{igm}$ and $f_\mathrm{gas}$ uncertainties are too broad to constrain most galactic feedback models, this result marks the first measurement of the IGM and CGM baryon fractions, as well as the first systematic separation of the FRB host dispersion measure into two components: arising from the halo and from the inner ISM/FRB engine.
The FRB 20190520B Sight Line Intersects Foreground Galaxy Clusters

https://doi.org/10.3847/2041-8213/acefb5

Lee, Khee-Gan; Khrykin, Ilya S.; Simha, Sunil; Ata, Metin; Huang, Yuxin; Prochaska, J. Xavier; Tejos, Nicolas; Cooke, Jeff; Nagamine, Kentaro; Zhang, Jielai (August 2023, The Astrophysical Journal Letters)

Abstract The repeating fast radio burst FRB 20190520B is an anomaly of the FRB population thanks to its high dispersion measure ($\mathrm{DM} = 1205\ \mathrm{pc\,cm^{-3}}$) despite its low redshift of $z_\mathrm{frb} = 0.241$. This excess has been attributed to a large host contribution of $\mathrm{DM}_\mathrm{host} \approx 900\ \mathrm{pc\,cm^{-3}}$, far larger than any other known FRB. In this paper, we describe spectroscopic observations of the FRB 20190520B field obtained as part of the FLIMFLAM survey, which yielded 701 galaxy redshifts in the field. We find multiple foreground galaxy groups and clusters, for which we then estimated halo masses by comparing their richness with numerical simulations. We discover two separate $M_\mathrm{halo} > 10^{14}\,M_\odot$ galaxy clusters at $z = 0.1867$ and $0.2170$ that are directly intersected by the FRB sight line within their characteristic halo radius $r_{200}$. Subtracting off their estimated DM contributions, as well as that of the diffuse intergalactic medium, we estimate a host contribution of $\mathrm{DM}_\mathrm{host} = 430_{-220}^{+140}$ or $280_{-170}^{+140}\ \mathrm{pc\,cm^{-3}}$ (observed frame), depending on whether we assume that the halo gas extends to $r_{200}$ or $2 \times r_{200}$. This significantly smaller $\mathrm{DM}_\mathrm{host}$, no longer the largest known value, is now consistent with Hα emission measures of the host galaxy without invoking unusually high gas temperatures. Combined with the observed FRB scattering timescale, we estimate the turbulent fluctuation and geometric amplification factor of the scattering layer to be $\tilde{F} G \approx 4.5$–$11\ (\mathrm{pc^{2}\,km})^{-1/3}$, suggesting that most of the gas is close to the FRB host. This result illustrates the importance of incorporating foreground data for FRB analyses both for understanding the nature of FRBs and to realize their potential as a cosmological probe.
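The host estimate in this abstract is a residual: what remains of the observed DM after the other contributions are subtracted. The values are quoted in the observed frame, and a rest-frame value follows from the standard $(1 + z)$ scaling. The sketch below illustrates both steps with hypothetical function names. It uses only the central values and redshift quoted above, ignores the uncertainties, and is not the authors' analysis code.

```python
def host_dm_observed(dm_total, dm_milky_way, dm_igm, dm_foreground_halos):
    """Observed-frame host DM as the residual of the DM budget.

    All inputs are in pc cm^-3. How each foreground term is modelled is
    outside the scope of this sketch.
    """
    return dm_total - dm_milky_way - dm_igm - dm_foreground_halos


def to_rest_frame(dm_host_observed, z_host):
    """Convert an observed-frame host DM to the host rest frame.

    Uses the standard relation DM_rest = DM_obs * (1 + z).
    """
    return dm_host_observed * (1.0 + z_host)


Z_FRB = 0.241  # redshift quoted in the abstract

# Central observed-frame values from the abstract for the two assumed
# gas extents; the quoted asymmetric uncertainties are not propagated.
for label, dm_obs in (("gas to r200", 430.0), ("gas to 2 x r200", 280.0)):
    dm_rest = to_rest_frame(dm_obs, Z_FRB)
    print(f"{label}: observed {dm_obs:.0f}, rest frame {dm_rest:.1f} pc cm^-3")
```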
Searching for the Sources of Excess Extragalactic Dispersion of FRBs

https://doi.org/10.3847/1538-4357/ace324

Simha, Sunil; Lee, Khee-Gan; Prochaska, J Xavier; Khrykin, Ilya S; Huang, Yuxin; Tejos, Nicolas; Marnoch, Lachlan; Ata, Metin; Bernales, Lucas; Bhandari, Shivani; et al (August 2023, The Astrophysical Journal)

Abstract The FLIMFLAM survey is collecting spectroscopic data of field galaxies near fast radio burst (FRB) sight lines to constrain key parameters describing the distribution of matter in the Universe. In this work, we leverage the survey data to determine the source of the excess extragalactic dispersion measure (DM), compared to Macquart relation estimates of four FRBs: FRB20190714A, FRB20200906A, FRB20200430A, and FRB20210117A. By modeling the gas distribution around the foreground galaxy halos and galaxy groups of the sight lines, we estimate DM_halos, their contribution to the FRB DMs. The FRB20190714A sight line shows a clear excess of foreground halos which contribute roughly two-thirds of the observed excess DM, thus implying a sight line that is baryon dense. FRB20200906A shows a smaller but nonnegligible foreground halo contribution, and further analysis of the intergalactic medium is necessary to ascertain the true cosmic contribution to its DM. FRB20200430A and FRB20210117A show negligible foreground contributions, implying a large host galaxy excess and/or progenitor environment excess.
Towards Understanding Gender Bias in Relation Extraction

https://doi.org/10.18653/v1/2020.acl-main.265

Gaut, Andrew; Sun, Tony; Tang, Shirlyn; Huang, Yuxin; Qian, Jing; ElSherief, Mai; Zhao, Jieyu; Mirza, Diba; Belding, Elizabeth; Chang, Kai-Wei; et al (January 2020, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics)

Recent developments in Neural Relation Extraction (NRE) have made significant strides towards Automated Knowledge Base Construction. While much attention has been dedicated towards improvements in accuracy, there have been no attempts in the literature to evaluate social biases exhibited in NRE systems. In this paper, we create WikiGenderBias, a distantly supervised dataset composed of over 45,000 sentences including a 10% human annotated test set for the purpose of analyzing gender bias in relation extraction systems. We find that when extracting spouse-of and hypernym (i.e., occupation) relations, an NRE system performs differently when the gender of the target entity is different. However, such disparity does not appear when extracting relations such as birthDate or birthPlace. We also analyze how existing bias mitigation techniques, such as name anonymization, word embedding debiasing, and data augmentation, affect the NRE system in terms of maintaining the test performance and reducing biases. Unfortunately, because NRE models rely heavily on surface level cues, we find that existing bias mitigation approaches have a negative effect on NRE. Our analysis lays groundwork for future work on quantifying and mitigating bias in NRE.
Full Text Available
Mitigating Gender Bias in Natural Language Processing: Literature Review

Sun, Tony; Gaut, Andrew; Tang, Shirlyn; Huang, Yuxin; ElSherief, Mai; Zhao, Jieyu; Mirza, Diba; Belding, Elizabeth; Chang, Kai-Wei; Wang, William Yang (July 2019, Association for Computational Linguistics (ACL 2019))

As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modeling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. We discuss gender bias based on four forms of representation bias and analyze methods recognizing gender bias. Furthermore, we discuss the advantages and drawbacks of existing gender debiasing methods. Finally, we discuss future studies for recognizing and mitigating gender bias in NLP.
Full Text Available
Challenges of COVID-19 Case Forecasting in the US, 2020–2021

https://doi.org/10.1371/journal.pcbi.1011200

Lopez, Velma K; Cramer, Estee Y; Pagano, Robert; Drake, John M; O’Dea, Eamon B; Adee, Madeline; Ayer, Turgay; Chhatwal, Jagpreet; Dalgic, Ozden O; Ladd, Mary A; et al (May 2024, PLOS Computational Biology)
Larremore, Daniel B (Ed.)
During the COVID-19 pandemic, forecasting COVID-19 trends to support planning and response was a priority for scientists and decision makers alike. In the United States, COVID-19 forecasting was coordinated by a large group of universities, companies, and government entities led by the Centers for Disease Control and Prevention and the US COVID-19 Forecast Hub (https://covid19forecasthub.org). We evaluated approximately 9.7 million forecasts of weekly state-level COVID-19 cases for predictions 1–4 weeks into the future submitted by 24 teams from August 2020 to December 2021. We assessed coverage of central prediction intervals and weighted interval scores (WIS), adjusting for missing forecasts relative to a baseline forecast, and used a Gaussian generalized estimating equation (GEE) model to evaluate differences in skill across epidemic phases that were defined by the effective reproduction number. Overall, we found high variation in skill across individual models, with ensemble-based forecasts outperforming other approaches. Forecast skill relative to the baseline was generally higher for larger jurisdictions (e.g., states compared to counties). Over time, forecasts generally performed worst in periods of rapid changes in reported cases (either in increasing or decreasing epidemic phases) with 95% prediction interval coverage dropping below 50% during the growth phases of the winter 2020, Delta, and Omicron waves. Ideally, case forecasts could serve as a leading indicator of changes in transmission dynamics. However, while most COVID-19 case forecasts outperformed a naïve baseline model, even the most accurate case forecasts were unreliable in key phases. Further research could improve forecasts of leading indicators, like COVID-19 cases, by leveraging additional real-time data, addressing performance across phases, improving the characterization of forecast confidence, and ensuring that forecasts were coherent across spatial scales. 
In the meantime, it is critical for forecast users to appreciate current limitations and use a broad set of indicators to inform pandemic-related decision making.
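The two evaluation metrics named in this abstract, prediction interval coverage and the weighted interval score (WIS), can be sketched as follows. This is a minimal illustration of the standard definitions for quantile-based forecasts, with hypothetical function names and input formats. It does not reproduce the paper's adjustment for missing forecasts relative to a baseline, nor its GEE model.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval.

    The score is the interval width plus a penalty, scaled by 2 / alpha,
    for an observation y that falls outside [lower, upper].
    """
    width = upper - lower
    below = (2.0 / alpha) * max(lower - y, 0.0)
    above = (2.0 / alpha) * max(y - upper, 0.0)
    return width + below + above


def weighted_interval_score(median, intervals, y):
    """WIS for one forecast; lower is better.

    intervals maps alpha -> (lower, upper) for each central interval,
    e.g. {0.5: (90, 110), 0.05: (70, 130)}. The absolute error of the
    median gets weight 1/2 and each interval score gets weight alpha / 2.
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def interval_coverage(bounds, observations):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(1 for (lo, hi), y in zip(bounds, observations) if lo <= y <= hi)
    return hits / len(observations)
```

A well-calibrated 95% interval should give `interval_coverage` close to 0.95 over many forecasts; the abstract reports it dropping below 50% during growth phases.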
Full Text Available
The United States COVID-19 Forecast Hub dataset

https://doi.org/10.1038/s41597-022-01517-w

Cramer, Estee Y.; Huang, Yuxin; Wang, Yijin; Ray, Evan L.; Cornell, Matthew; Bracher, Johannes; Brennen, Andrea; Rivadeneira, Alvaro J.; Gerding, Aaron; House, Katie; et al (December 2022, Scientific Data)

Abstract Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at county, state, and national levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages.
Full Text Available
Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States

https://doi.org/10.1073/pnas.2113561119

Cramer, Estee Y.; Ray, Evan L.; Lopez, Velma K.; Bracher, Johannes; Brennen, Andrea; Castro Rivadeneira, Alvaro J.; Gerding, Aaron; Gneiting, Tilmann; House, Katie H.; Huang, Yuxin; et al (April 2022, Proceedings of the National Academy of Sciences)

Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multimodel ensemble forecast that combined predictions from dozens of groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-wk horizon three to five times larger than when predicting at a 1-wk horizon. This project underscores the role that collaboration and active coordination between governmental public-health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
Full Text Available